
feat(transfers): add NM_Wells 1:1 mirror transfer (BDMS-945) #740

Merged: jirhiker merged 44 commits into staging from BDMS-945-1-1-Mirror-Transfer on Jun 24, 2026
Conversation

@jirhiker

Member

Summary

This PR is a proof-of-concept implementation of Jake's PR (#686) for running the 1:1 migration of the prioritized tables in the NM_Wells database. It also creates a set of materialized views and OGC layers translated from the views the geothermal staff use most often.

Details

  • Adds 18 NMW_* staging mirror tables (1:1 copy of NM_Wells SQL Server tables) with FK constraints
  • Adds 6 new OGC API collections: geothermal_wells_bht, geothermal_wells_temperature_profile, bht_measurements, temp_depth_measurements, heat_flow, dst
  • Adds transfer pipeline (python -m transfers.transfer_geothermal) to load data from CSV exports or SQL dump

Migrations

3 consolidated migrations (was 14 during development):

  1. c0d1e2f3a4b5 — all NMW_* tables + FK constraints
  2. d1e2f3a4b5c6 — per-well aggregate OGC views (BHT, temperature profile, heat flow summaries)
  3. e2f3a4b5c6d7 — individual-row OGC views (BHT measurements, temp-depth, heat flow, DST)

Data load

After migrating: python -m transfers.transfer_geothermal
Tables must load in parent→child order (enforced in NMW_MIRROR_SPECS).
After load, the ogc_geothermal_wells_temperature_profile materialized view refreshes automatically.
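
The parent→child rule above can be checked mechanically. The following is a minimal sketch, not the project's code: the `Spec` dataclass, its `parents` field, and the four sample entries are assumptions made for illustration, and the real NMW_MIRROR_SPECS may be shaped differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Spec:
    """Hypothetical stand-in for one entry of NMW_MIRROR_SPECS."""
    table: str
    parents: tuple = ()  # tables this one references through a foreign key


def order_violations(specs):
    """Return (child, parent) pairs where the parent is not loaded before the child."""
    seen = set()
    violations = []
    for spec in specs:
        for parent in spec.parents:
            if parent not in seen:
                violations.append((spec.table, parent))
        seen.add(spec.table)
    return violations


# Illustrative subset only; the FK edges shown here are assumptions.
SPECS = [
    Spec("NMW_WellHeaders"),
    Spec("NMW_WellRecords", parents=("NMW_WellHeaders",)),
    Spec("NMW_WellSamples", parents=("NMW_WellRecords",)),
    Spec("NMW_GtBhtData", parents=("NMW_WellSamples",)),
]

print(order_violations(SPECS))  # an empty list means the load order is safe
```

A check like this is cheap enough to run as a unit test, so a reordered spec list fails before any data is loaded.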

Reviewer notes

  • core/pygeoapi.py is unchanged from staging (reverted to original)
  • NMW_WellRecords.SourceID is a free-text citation string, not a numeric FK — NMW_Sources joins on it as text

Notes

  • I

Supersedes #738 (auto-closed when its branch feature/BDMS-826-NMW-migrations-core was renamed to BDMS-945-1-1-Mirror-Transfer). Adds: NM_Wells mirror tests (tests/test_nmw_mirror.py), CAST-unwrap fix in nmw_sql_dump.py, and docs/nm_wells-migration.md.

jirhiker and others added 30 commits June 6, 2026 16:20
Phase 1 of the NM_Wells -> Ocotillo migration: faithful column-for-column
staging mirror of the legacy NM_Wells SQL Server DB, plus loaders. The
transform into the Ocotillo model (Phase 2) is documented inline but not built.

- db/nmw_legacy.py: 17 NMW_* mirror models (5 Main, 7 Geothermal, 5 DST),
  source column names preserved, per-column Phase-2 transform-target notes.
  Main columns from the planning workbook field map; Geothermal/DST columns,
  lengths and PKs taken directly from the SQL-dump DDL.
- alembic: two migrations (Main; Geothermal+DST) chained off current head,
  bodies generated from model metadata. Single head.
- transfers/nmw_mirror_transfer.py: data-driven CSV -> NMW_* loader with type
  coercion (NaN/NaT -> None, rowversion dropped), chunked ON CONFLICT upsert.
  Gated by TRANSFER_NMW_MIRROR (default off; separate source DB).
- transfers/reference_lexicon_transfer.py: loads all 49 ref_* lookups into the
  lexicon (category per table), idempotent like init_lexicon; registered as a
  foundational transfer.
- db/__init__.py, transfers/transfer.py, .env.example: wiring.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
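
For readers unfamiliar with the coercion step mentioned in this commit, here is a standard-library sketch of the idea. The function name `clean_row` and the sample row are hypothetical; the actual loader works on pandas data and also maps NaT to None, which this sketch does not cover.

```python
import math

# Column dropped during load; SSMA_TimeStamp is the rowversion column named in this PR.
DROPPED_COLUMNS = {"SSMA_TimeStamp"}


def clean_row(row):
    """Return a copy of row with dropped columns removed and float NaN mapped to None."""
    cleaned = {}
    for column, value in row.items():
        if column in DROPPED_COLUMNS:
            continue
        if isinstance(value, float) and math.isnan(value):
            value = None  # NaN has no SQL equivalent; store NULL instead
        cleaned[column] = value
    return cleaned


# Hypothetical input row for illustration.
row = {"WellDataID": 7, "Depth": float("nan"), "SSMA_TimeStamp": b"\x00\x01"}
print(clean_row(row))
```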
- nmw_mirror_transfer: parse DateTime values with pd.to_datetime(errors=coerce)
  since read_csv does not parse_dates (avoids driver-dependent insert failures).
- db/nmw_legacy: fix attribute typos (dst_operator, recov_column, resistivity)
  while preserving the legacy DB column names; fix latitude_dd27 comment typo.
- reference_lexicon_transfer: correct stale exclusion comment (ref_date_drilled
  is included; only ref_nm_quads is excluded).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The SSMA_TimeStamp column is a SQL Server rowversion artifact with no value as
staging data (the loader already skipped it). Remove it from the NMW_* mirror
models and both migrations; drop the now-unused LargeBinary import.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Confirmed source PKs from the NM_Wells SQL dump DDL:
- WellHeaders/WellRecords/WellSamples have declared PRIMARY KEY constraints
  (WellDataID / RecrdSetID / SamplSetID) matching the models.
- WellLocations and WellZDatum declare no PK, only unique indexes on OBJECTID
  and GlobalID. Switch WellZDatum PK from GlobalID to OBJECTID for consistency
  with WellLocations and safety (OBJECTID identity is never NULL; the GlobalID
  unique index permits one NULL). Update the migration accordingly.

Remove the TODO(verify) note; PKs are now confirmed.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Add transfers/nmw_sql_dump.py: streams INSERT [dbo].[tbl_*] (...) VALUES (...)
statements out of a SQL Server data-dump .sql file, yielding {column: value}
dicts. Handles N'...' / escaped '', embedded commas/parens, CAST(expr AS type),
multi-row VALUES, 0x binary -> None, and UTF-16/UTF-8 (BOM auto-detect).

Refactor transfer_nmw_mirror to be source-agnostic: when NMW_SQL_DUMP points at
a .sql data dump it loads from there, otherwise falls back to per-table CSVs.
Same model-driven type coercion and chunked ON CONFLICT upsert for both.

Note: the provided NMWells.sql is schema-only; NMW_SQL_DUMP expects a separate
data dump containing INSERT statements.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
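
The parsing cases listed in this commit (N'...' strings, escaped '', embedded commas and parentheses, 0x binary -> None) can be illustrated with a small tokenizer. This is a sketch under assumptions, not the code in nmw_sql_dump.py: `split_values` and `to_python` are hypothetical names, and CAST handling is left out.

```python
def split_values(body):
    """Split the inside of a VALUES (...) tuple on top-level commas."""
    parts, current, depth, in_string = [], [], 0, False
    i = 0
    while i < len(body):
        ch = body[i]
        if in_string:
            current.append(ch)
            if ch == "'":
                if body[i + 1:i + 2] == "'":  # '' is an escaped quote, stay in the string
                    current.append("'")
                    i += 1
                else:
                    in_string = False
        elif ch == "'":
            in_string = True
            current.append(ch)
        elif ch == "(":
            depth += 1
            current.append(ch)
        elif ch == ")":
            depth -= 1
            current.append(ch)
        elif ch == "," and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
        i += 1
    parts.append("".join(current).strip())
    return parts


def to_python(token):
    """Convert one SQL literal token to a Python value."""
    if token.upper() == "NULL" or token.lower().startswith("0x"):
        return None  # NULLs and binary literals are both loaded as None
    if token[:2] in ("N'", "n'"):
        token = token[1:]  # drop the Unicode-string prefix
    if len(token) >= 2 and token.startswith("'") and token.endswith("'"):
        return token[1:-1].replace("''", "'")
    for cast in (int, float):
        try:
            return cast(token)
        except ValueError:
            pass
    return token


tokens = split_values("1, N'O''Brien, Jr. (est.)', NULL, 0x00AF, 3.5")
print([to_python(t) for t in tokens])
```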
…er.py

Move the NM_Wells (geothermal) orchestration out of transfers/transfer.py into a
new standalone transfers/transfer_geothermal.py. Revert all NM_Wells wiring from
transfer.py and mark that module deprecated (module docstring + DeprecationWarning
in transfer_all) so new migrations get their own orchestrator.

transfer_geothermal.py runs the reference->lexicon load
(TRANSFER_GEOTHERMAL_REFERENCE) and the NMW_* mirror load (TRANSFER_NMW_MIRROR);
both default on. Run: python -m transfers.transfer_geothermal.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
reference_lexicon_transfer now selects its row source the same way as
nmw_mirror_transfer: a SQL Server data dump when NMW_SQL_DUMP is set (parsed by
nmw_sql_dump.iter_table_rows), otherwise per-table CSV. _pick_columns operates
on a column-name list and rows are processed as dicts so both sources share one
path.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Add LEXICON_REF_BY_COLUMN mapping every coded mirror attribute to its ref_*
source table (which reference_lexicon_transfer loads as a lexicon category whose
rows become terms). These 40 attributes will become lexicon_term FKs / enums in
the Phase-2 transform. Add LEXICON_CANDIDATES_NO_REF for 8 coded columns that
have no ref_* table and will need a new category/enum (DrillFluid, TestType,
Operation, etc.). Validated: every column + ref table exists.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Remove the dead `category = table[4:]` line and fix the stale docstring; the
category is nmw_<table> (e.g. nmw_ref_states).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ile)

Two pygeoapi point layers over the NMW_* staging mirror, geometry from
NMW_WellLocations Lat/Long_dd83:

- ogc_geothermal_wells_bht: one feature per geothermal well with bottom-hole
  temperature data (NMW_GtBhtData), aggregate BHT stats.
- ogc_geothermal_wells_temperature_profile: one feature per geothermal well with
  a downhole temperature-vs-depth series (NMW_GtTempDepths) as an ordered JSON
  array.

Wells link via gt_*.SamplSetID -> NMW_WellSamples -> NMW_WellRecords ->
NMW_WellLocations. Guards required tables; drops views on downgrade.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The temperature-vs-depth profile view scans/groups NMW_GtTempDepths (~370k
source rows) and builds a per-well JSON series — too heavy to recompute per
pygeoapi request. Convert it to a MATERIALIZED view with a unique index on
well_data_id (enables REFRESH CONCURRENTLY) and a GiST index on geom. The BHT
view stays a regular view (small source). REFRESH after a data reload.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
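
As a rough sketch of what this commit describes, the statements involved might look like the strings below. The index names are assumptions, and the function only builds SQL text; in a migration the strings would be passed to something like Alembic's `op.execute`. Postgres requires a unique index on the materialized view before `REFRESH ... CONCURRENTLY` is allowed, which is why the unique index on well_data_id matters.

```python
def matview_statements(view="ogc_geothermal_wells_temperature_profile"):
    """Build the index and refresh statements for one materialized view."""
    return [
        # Unique index: a precondition for REFRESH MATERIALIZED VIEW CONCURRENTLY.
        f"CREATE UNIQUE INDEX IF NOT EXISTS ux_{view}_well ON {view} (well_data_id)",
        # GiST index: speeds up the spatial (bbox) filters issued by pygeoapi.
        f"CREATE INDEX IF NOT EXISTS ix_{view}_geom ON {view} USING gist (geom)",
        # Concurrent refresh keeps the view readable while it is rebuilt.
        f"REFRESH MATERIALIZED VIEW CONCURRENTLY {view}",
    ]


for statement in matview_statements():
    print(statement)
```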
pygeoapi point layer ogc_geothermal_wells_heat_flow: one feature per geothermal
well with summary heat-flow determinations (NMW_GtSumHeatFlow) - aggregate heat
flow, thermal gradient, thermal conductivity and quality. Geometry from
NMW_WellLocations; linked via NMW_GtSumHeatFlow.RecrdSetID -> NMW_WellRecords.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
pygeoapi point layer ogc_geothermal_wells_interval_heat_flow from NMW_GtHeatFlow
(per-interval values: Q heat flow, gradient, Kpr conductivity, Ka diffusivity),
one feature per well. Distinct from ogc_geothermal_wells_heat_flow (summary,
NMW_GtSumHeatFlow). Linked via IntrvlGUID -> NMW_WsIntervals -> NMW_WellSamples
-> NMW_WellRecords -> NMW_WellLocations.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
- Rename ogc_geothermal_wells_heat_flow -> ogc_geothermal_wells_summary_heat_flow.
- Add a `measurements` JSON series to both heat-flow views: one element per
  determination/interval (depth range, heat flow, gradient, conductivity, etc.),
  ordered by depth, alongside the existing per-well aggregates.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
When NMW_SQL_DUMP is set, the mirror now parses the dump with sqlparse
(nmw_sql_dump.write_table_csv) into a CSV per table, then bulk-loads each via
Postgres COPY ... FROM STDIN (truncate + COPY; Postgres casts text -> types) —
far faster than row-by-row ORM inserts. CSV dir defaults to a temp dir
(override NMW_CSV_DIR). The CSV-exports fallback (no dump) keeps the row-insert
path. Adds sqlparse dependency.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
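
To make the truncate + COPY path concrete, here is a minimal sketch of the two pieces a loader needs: a CSV payload and the matching COPY statement. The helper names and the `WellName` column are hypothetical, and the sketch stops short of opening a database connection. It relies on Postgres treating an unquoted empty CSV field as NULL by default.

```python
import csv
import io


def rows_to_csv(columns, rows):
    """Serialize dict rows to CSV text, writing None as an empty (NULL) field."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(columns)
    for row in rows:
        writer.writerow(["" if row.get(c) is None else row[c] for c in columns])
    return buffer.getvalue()


def copy_statement(table, columns):
    """Build a COPY ... FROM STDIN statement with quoted, case-preserving identifiers."""
    cols = ", ".join(f'"{c}"' for c in columns)
    return f'COPY "{table}" ({cols}) FROM STDIN WITH (FORMAT csv, HEADER true)'


columns = ["WellDataID", "WellName"]
payload = rows_to_csv(columns, [
    {"WellDataID": 1, "WellName": None},
    {"WellDataID": 2, "WellName": "A, B"},
])
print(copy_statement("NMW_WellHeaders", columns))
print(payload)
```

Letting Postgres cast the text fields avoids per-row type coercion in Python, which is where most of the speedup over row-by-row inserts comes from.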
Add refresh_materialized_views (REFRESH the geothermal materialized views,
currently ogc_geothermal_wells_temperature_profile; skip any not present). The
transfer_geothermal orchestrator calls it after the NMW_* mirror load so the
materialized view reflects the freshly loaded data.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
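
The "skip any not present" behaviour could be sketched as below. This is an assumption about the shape of refresh_materialized_views, not its actual body: the `present` argument stands in for names that a real implementation might read from the `pg_matviews` catalog view.

```python
# Views the orchestrator wants refreshed after a load (one today, per this PR).
MATERIALIZED_VIEWS = ("ogc_geothermal_wells_temperature_profile",)


def refresh_statements(present, wanted=MATERIALIZED_VIEWS):
    """Return REFRESH statements only for wanted views that exist in the database."""
    present = set(present)
    return [f"REFRESH MATERIALIZED VIEW {name}" for name in wanted if name in present]


print(refresh_statements(["ogc_geothermal_wells_temperature_profile"]))
print(refresh_statements([]))  # nothing to refresh before the migrations have run
```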
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Resolve requirements.txt conflict by regenerating from the merged uv.lock
(uv export). Brings staging fixes incl. the CLI test update.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
- Generate CSV files via script
- Fix some data overflow problems
Add geothermal_wells_bht and geothermal_wells_temperature_profile to the
pygeoapi-config.yml template, exposing NM_Wells geothermal views as OGC
API - Features collections.

Fix _resolve() priority in pygeoapi.py so that shell PYGEOAPI_* env vars
(e.g. PYGEOAPI_POSTGRES_HOST=db set in docker-compose) take precedence
over generic POSTGRES_* values from .env, preventing localhost from
shadowing the Docker service hostname.

Add INSTALL_DEV=true build arg to docker-compose so faker and other dev
dependencies are present when the container runs in development mode.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
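
The precedence rule described for `_resolve()` can be shown in a few lines. This sketch is not the function in core/pygeoapi.py (which this PR later reverted to the staging version); the name `resolve`, its signature, and the two mapping arguments are assumptions.

```python
def resolve(name, shell_env, dotenv_values, default=None):
    """Prefer PYGEOAPI_POSTGRES_<name> from the shell over POSTGRES_<name> from .env."""
    specific = shell_env.get(f"PYGEOAPI_POSTGRES_{name}")
    if specific:
        return specific
    return dotenv_values.get(f"POSTGRES_{name}", default)


# docker-compose sets the service hostname; .env still says localhost.
shell_env = {"PYGEOAPI_POSTGRES_HOST": "db"}
dotenv_values = {"POSTGRES_HOST": "localhost"}
print(resolve("HOST", shell_env, dotenv_values))
```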
adding migrations and views
Translates the legacy MSSQL BHT query to a PostgreSQL view returning one
row per individual BHT measurement (not aggregated per well). Joins
NMW_GtBhtData through headers, samples, records, Z-datum filter, well
headers, and locations. Exposes 5 063 features via /ogcapi/collections/bht_measurements.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Translates the legacy MSSQL TempDepth2_SortedWellName query to a
PostgreSQL view returning one row per individual temperature-depth reading.
Includes both NAD27 and NAD83 coordinates, elevation datums (GL, unspc, KB),
and filters out excluded locations. Exposes 363 858 features via
/ogcapi/collections/temp_depth_measurements.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds a 1:1 staging mirror of the NM_Wells tbl_sources publication registry.
Wires it into the nmw_mirror_transfer MirrorSpec list so it loads with the
rest of the NMW tables. Data loads once tbl_sources.csv is exported from the
legacy SQL Server and placed in transfers/data/nma_csv_cache/.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Translates the legacy MSSQL HeatFlow query to a PostgreSQL view. Includes
CASE WHEN unit conversions (ft->m, TCU->SI, HFU->mW/m²), publication
attribution from NMW_Sources, and a LEFT JOIN on Well_Z_Datum for elevation.
County WHERE filter removed in favour of API-level filtering. Exposes
1 522 features via /ogcapi/collections/heat_flow.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
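
The unit conversions named in this commit (ft->m, TCU->SI, HFU->mW/m²) reduce to three constants. The sketch below assumes the thermochemical calorie (4.184 J), so 1 HFU = 41.84 mW/m² and 1 TCU = 0.4184 W/(m·K); the view itself may use slightly different factors, and the function names and unit labels are hypothetical.

```python
FT_TO_M = 0.3048        # exact, by definition of the international foot
HFU_TO_MW_M2 = 41.84    # 1 HFU = 1 µcal/(cm²·s), thermochemical calorie assumed
TCU_TO_W_MK = 0.4184    # 1 TCU = 1 mcal/(cm·s·°C), thermochemical calorie assumed


def to_metres(value, unit):
    """Convert a depth to metres; mirrors a CASE WHEN on the unit column."""
    return value * FT_TO_M if unit == "ft" else value


def heat_flow_mw_m2(value, unit):
    """Convert heat flow to mW/m²."""
    return value * HFU_TO_MW_M2 if unit == "HFU" else value


def conductivity_w_mk(value, unit):
    """Convert thermal conductivity to W/(m·K)."""
    return value * TCU_TO_W_MK if unit == "TCU" else value


print(to_metres(100.0, "ft"), heat_flow_mw_m2(1.5, "HFU"), conductivity_w_mk(5.0, "TCU"))
```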
Translates the legacy MSSQL DST query (which never executed in Access due to
a broken DST_flwHstryConcat saved query). Replaces the broken cross-join with
a string_agg() CTE over NMW_WsDstFlowHistory, concatenating operation
descriptions per interval. GROUP BY with no aggregates translated to
SELECT DISTINCT. Exposes 1 798 features via /ogcapi/collections/dst.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
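
For reviewers who want the two translations spelled out, here is a plain-Python analogue. It only mimics what `string_agg()` grouped by interval and `SELECT DISTINCT` do; the function names, the separator, and the sample rows are assumptions, not values from the view.

```python
from collections import defaultdict


def concat_operations(flow_rows, separator="; "):
    """Analogue of string_agg(operation, separator) GROUP BY interval."""
    grouped = defaultdict(list)
    for interval_id, operation in flow_rows:
        if operation:  # string_agg also ignores NULL inputs
            grouped[interval_id].append(operation)
    return {interval: separator.join(ops) for interval, ops in grouped.items()}


def distinct_rows(rows):
    """Analogue of SELECT DISTINCT: a GROUP BY over every column with no aggregates."""
    return list(dict.fromkeys(rows))


# Hypothetical flow-history rows: (interval id, operation description).
history = [("i1", "Initial flow"), ("i1", "Shut in"), ("i2", None), ("i2", "Final flow")]
print(concat_operations(history))
print(distinct_rows([("w1", 10), ("w1", 10), ("w2", 20)]))
```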
peterrowland and others added 6 commits June 22, 2026 11:39
revert changes to pygeoapi
simplified migrations and change order of mirror transfers to start with well headers
Doc existed untracked in a worktree and docs/ is gitignored, so the
NM_Wells migration plan referenced 4x in db/nmw_legacy.py and
transfers/nmw_mirror_transfer.py pointed at a missing file. Force-add
to land it on this branch (matches existing docs/ force-add pattern).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
tests/test_nmw_mirror.py (19 tests) covers SPEC invariants V1 (18 mirror
tables + PK), V2 (FK parent loads before child in NMW_MIRROR_SPECS), V3 (8
OGC views built), V5 (temperature-profile view materialized), V6 (geothermal
pygeoapi collections back existing relations), V10 (DB-level FK constraints),
and the SQL-dump value parser.

Fixes B1/V11: _CAST_RE in transfers/nmw_sql_dump.py only matched AS-types
without parentheses, so CAST(x AS nvarchar(10)) / CAST(n AS Decimal(18,2))
left the value as a literal "CAST(...)" string. Widened to allow one paren
level in the type name.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
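
The B1/V11 fix can be reproduced with two regular expressions. Both patterns here are illustrative reconstructions, not the actual `_CAST_RE` before or after the change: `NARROW` accepts only a bare type name, while `WIDE` also allows one parenthesized argument list in the type.

```python
import re

# Hypothetical "before" pattern: the type after AS may not contain parentheses.
NARROW = re.compile(r"^CAST\((.+)\s+AS\s+\w+\)$", re.IGNORECASE | re.DOTALL)

# Hypothetical "after" pattern: one optional paren level, e.g. nvarchar(10) or Decimal(18, 2).
WIDE = re.compile(r"^CAST\((.+)\s+AS\s+\w+(?:\([^()]*\))?\)$", re.IGNORECASE | re.DOTALL)


def unwrap_cast(token):
    """Return the inner expression of CAST(expr AS type), or the token unchanged."""
    match = WIDE.match(token.strip())
    return match.group(1).strip() if match else token


print(unwrap_cast("CAST(N'abc' AS nvarchar(10))"))
print(unwrap_cast("CAST(12.50 AS Decimal(18, 2))"))
print(NARROW.match("CAST(N'abc' AS nvarchar(10))"))  # None: the narrow form misses it
```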
Resolve dependency conflicts: keep sqlparse>=0.5.5 (mirror SQL-dump parser),
take staging's starlette==1.3.1. Regenerated uv.lock and requirements.txt
from the merged pyproject.toml (uv lock + uv export).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@jirhiker jirhiker changed the title from "Feature(transfers) NM_Wells 1:1 mirror transfer (BDMS-945)" to "feat(transfers): add NM_Wells 1:1 mirror transfer (BDMS-945)" on Jun 24, 2026

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 83fe74673e


Comment thread transfers/nmw_mirror_transfer.py Outdated
Comment thread alembic/versions/d1e2f3a4b5c6_nmw_per_well_geothermal_ogc_views.py Outdated
jirhiker and others added 8 commits June 23, 2026 19:40
Merging staging left two alembic heads: e2f3a4b5c6d7 (NMW mirror + OGC
views) and x2y3z4a5b6c7 (staging pg_cron matview refresh). Add a merge
revision so `alembic upgrade head` resolves to a single head. Fixes the
bdd-tests CI failure.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Replace the merge revision (03ef547ed7be) with a linear history: repoint
c0d1e2f3a4b5.down_revision from the old shared base t6u7v8w9x0y1 to
staging's head x2y3z4a5b6c7. Single head e2f3a4b5c6d7; no merge point.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
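
The effect of repointing `down_revision` can be checked with a small head-counting sketch. The revision IDs come from this PR, but the graph is simplified: it assumes x2y3z4a5b6c7 descends directly from t6u7v8w9x0y1, which may skip intermediate staging revisions. The `heads` helper is hypothetical and is not Alembic's API.

```python
def heads(down_revisions):
    """Return revisions that no other revision names as its down_revision."""
    parents = set()
    for down in down_revisions.values():
        if down is None:
            continue
        parents.update(down if isinstance(down, tuple) else (down,))
    return sorted(set(down_revisions) - parents)


# After merging staging: both branches hang off the old shared base.
before = {
    "t6u7v8w9x0y1": None,
    "x2y3z4a5b6c7": "t6u7v8w9x0y1",   # simplified: assumed direct child of the base
    "c0d1e2f3a4b5": "t6u7v8w9x0y1",
    "d1e2f3a4b5c6": "c0d1e2f3a4b5",
    "e2f3a4b5c6d7": "d1e2f3a4b5c6",
}

# After the fix: c0d1e2f3a4b5 is rebased onto staging's head.
after = dict(before, c0d1e2f3a4b5="x2y3z4a5b6c7")

print(heads(before))  # two heads, so `alembic upgrade head` is ambiguous
print(heads(after))   # one head, linear history
```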
Cross-checked NMW_MIRROR_SPECS against the planning workbook: all 18
NM_Wells "Migrate First" tables are handled. Flagged 4 Subsurface Library
"Migrate First" tables (dst_scan, log_scanned, Well_Header, well_operators)
as out-of-scope (separate source DB) under new task T24.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Addresses PR #740 Codex review:
- P1 (B2/V13): dump-load reload used a bare TRUNCATE, rejected by the FK
  refs to NMW_WellHeaders. Use TRUNCATE ... CASCADE (parents load before
  children per V2, so cascaded children are reloaded after).
- P2 (B3/V14): per-well geothermal views joined NMW_WellLocations directly;
  multiple OBJECTID rows per WellDataID inflated counts / emitted >1 feature
  per well. Dedup via DISTINCT ON loc CTE in all 4 views (bht, summary and
  interval heat-flow; profile already did this).

Adds two regression tests. 21 passing.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
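
The P2 fix amounts to keeping a single location row per well before joining. A plain-Python analogue of the `DISTINCT ON` CTE is sketched below; choosing the lowest OBJECTID as the surviving row is an assumption, since the commit message does not state the ordering used, and the sample rows are made up.

```python
def one_location_per_well(locations):
    """Keep one row per WellDataID, like SELECT DISTINCT ON (WellDataID) ... ORDER BY."""
    chosen = {}
    ordered = sorted(locations, key=lambda loc: (loc["WellDataID"], loc["OBJECTID"]))
    for loc in ordered:
        chosen.setdefault(loc["WellDataID"], loc)  # first row per well wins
    return list(chosen.values())


# Hypothetical rows: well 1 has two OBJECTID rows, which previously doubled its features.
locations = [
    {"WellDataID": 1, "OBJECTID": 12},
    {"WellDataID": 1, "OBJECTID": 5},
    {"WellDataID": 2, "OBJECTID": 9},
]
print(one_location_per_well(locations))
```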
Operational steps to run the Phase-1 NM_Wells 1:1 mirror (export, load,
refresh) and verify it: row-count parity, FK orphan checks, OGC view/API
checks, reversible migrations. Maps sign-off to BDMS-969/951/954 and
documents the B1/B2/B3 fixes in troubleshooting. docs/ is gitignored;
force-added.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@jirhiker jirhiker merged commit 3a9fcc5 into staging Jun 24, 2026
9 checks passed
@jirhiker jirhiker deleted the BDMS-945-1-1-Mirror-Transfer branch June 24, 2026 19:34